This folder contains the R code for the model to generate the marginal damages for emissions of each species under different CRFs and baseline data (baseline ambient PM2.5 levels and baseline mortality).

It contains 4 sub-directories:
"Code" : containing the model code
"Inputs" : containing model inputs (see section 1 below)
"Results" : containing model results
"InMAP_to_County_Population_Mapping" : containing additional Code, Inputs, Outputs, and Documentation, for the population mapping of InMAP cells to U.S. counties (see section 1.f below)

There is also a separate documentation file for the Outputs, i.e. for the datasets containing the results of the Marginal Damages Model. This additional file is located at:
Marginal_Damages_Model/Results/Documentation_Marginal_Damages_Model_Results_ReadMe.txt 

We generate two sets of results:
(i) Marginal damages per tonne of ground-level emissions (Marginal Values -- MV), for each county. 
(ii) The source-receptor matrix (SRM) for marginal damages -- these are matrices MI described in Eq. 2 of the manuscript. They show the marginal damages occurring in each county as a consequence of a ground-level emission of 1 tonne in each county. The rows are the sources and the columns are the receptors, so that the row sums for these matrix are equivalent to the total marginal damages (i).

For both (i) and (ii), the damages are calculated for a combination of 80 different pollutant species, CRFs, Baseline ambient PM2.5 levels, and Mortality data. These are:

* 5 pollutant species: (Primary PM2.5, SO2, NOX, NH3, and VOC
* 4 CRFs: GEMM (Burnett et al., 2018), Vodonos et al. (2018) Parametric, Vodonos et al. (2018) Spline, and Krewski et al. (2009).
* 2 Baseline ambient PM2.5 levels: 2008 and 2017
* 2 Baseline Mortality Data: 2008 and 2017

***ALL values are monetized damages (2017 USD) caused by emissions of 1 tonne (metric ton, or 10^6 grams) of a given species (depending on the dataset), in each source.

***The counties (rows in the MV files; both rows and columns in the SRM files) are presented in the same order of the Results_With_Emissions/Inputs/Auxiliary/STCOUList.csv file

Note: Running the marginal damages model code on the FAS Cluster took about one hour of computing time on a single core, but required ~80 GB of memory/RAM. It also requires ISRM file (see 1.1), which is 165 GB in size.



-----

1. Model Inputs:
We provide all model inputs required to run the model, except for the InMAP Source-Receptor Matrix (ISRM) and the coefficients from the Vodonos et al. (2018). These can be downloaded as described in section 1.1.


1.1. Model Inputs that are not provided and need to be downloaded 

a) ISRM
The ISRM is from the Goodkind et al. (2019a) study
This dataset is available at:
https://dx.doi.org/10.5281/zenodo.3590127
Accessed on: March 03, 2021
File: isrm_v1.2.1.zip
Version: 1.2.1 (March 11, 2019)
Size (.zip): 12.5 GB
Note: after unzipping, the file is 165.1 GB in size.

b) Vodonos Coefficients
This dataset is for the Vodonos et al. (2018) study.
This dataset is available at:
https://github.com/AlinaVod/meta_regression_pm2.5
Accessed on: March 09, 2021
Version: May 29, 2020
Size (.zip, all files): 352K


1.2. Model inputs that are provided with the code:
Note: all model inputs provided with the code (c-f below) are provided in the file Marginal_Damages_Model/Inputs/
Model_Input.RData. 
This file contains 7 objects:
c) Ambient.PM25
d.i) Deaths.AllCause.2008
d.ii) Deaths.AllCause.2017
d.iii) Deaths.AllCause.2008
d.iv) Deaths.AllCause.2017
e) GEMM.Coefficients
f) InMAP.Cells.by.County


c) Ambient Baseline PM2.5 data
Objects in Model_Inputs.RData file: Ambient.PM25
Description: dataframe/matrix of 3,108 rows and 3 columns
The 3,108 rows represent the 3,108 counties
The columns are:
1: "STCOU" : FIPS state and county code
2: "Ambient.PM25.2008" : Estimate of ambient PM2.5 levels in 2008 (population-weighted) [ug/m3]
3: "Ambient.PM25.2017" : Estimate of ambient PM2.5 levels in 2017 (population-weighted) [ug/m3]
Source: These ambient levels were generated using the 1-km resolution estimates by Di et al. (2019). Di et al. (2019) include estimates only until 2016, so we used that year (2016) as a proxy for 2017 concentrations. We produced county level estimates weighting by population at a Census block level using data from the 2010 Decennial Census (U.S. Census Bureau, 2011). We assigned a concentration to each Census Block by using the weighted average of the nearest 4 cell centroids from Di et al. (2019) and weighting by inverse distance. We then aggregated the Census Blocks to counties.


d) Baseline Mortality data
Four objects in Model_Inputs.RData file: 
Deaths.AllCause.2008
Deaths.AllCause.2017
Deaths.GEMM.2008
Deaths.GEMM.2017

Description: Four dataframes/matrices:
d.i) Deaths.AllCause.2008 and 
d.ii) Deaths.AllCause.2017
These two objects contain 3,108 rows (one for each county) and 2 columns:
1: "STCOU" : FIPS state and county code
2: "Deaths.[Year]" : Estimate of number of deaths for the year in question (2008 or 2017), from all causes

Source for baseline mortality data: Centers for Disease Control and Prevention (2020) and U.S. Department of Health and Human Services (2020). The directory "Preprocessing_of_Model_Input_Data" has documentation and code used to process the original data from the Centers for Disease Control and Prevention (2020) and the U.S. Department of Health and Human Services (2020), to create the dataset exactly in the format used here.

d.iii) Deaths.GEMM.2008 and 
d.iv) Deaths.GEMM.2017
These two objects contain 3,108 rows (one for each county) and 13 columns:
1: "STCOU" : FIPS state and county code
2-13: "Deaths.[Year].[Age Group]" : Estimate of number of deaths from non-accidental causes for the year in question (2008 or 2017), for the age group in question. Each of the two files has data for one of the years, and the twelve age groups follow the GEMM 5-year age groups. In order, the age groups are:
2: "27.5" (25-29 years old)
3: "32.5" (30-34 years old)
4: "37.5" (35-39 years old)
5: "42.5" (40-44 years old)
6: "47.5" (45-49 years old)
7: "52.5" (50-54 years old)
8: "57.5" (55-59 years old)
9: "62.5" (60-64 years old)
10: "67.5" (65-69 years old)
11: "72.5" (70-74 years old)
12: "77.5" (75-79 years old)
13: "85" (80+ years old)

Source for baseline mortality data: Centers for Disease Control and Prevention (2020) and U.S. Department of Health and Human Services (2020). The directory "Preprocessing_of_Model_Input_Data" has documentation and code used to process the original data from the Centers for Disease Control and Prevention (2020) and the U.S. Department of Health and Human Services (2020), to create the dataset exactly in the format used here.

e) GEMM Coefficients 
Object in Model_Inputs.RData file: GEMM.Coefficients
A dataframe/matrix of 13 rows and 7 columns containing the GEMM coefficients from Burnett et al. (2018). We use the non-accidental GEMM (GEMM NCD+LRI) and the coefficients including a recent Chinese male cohort. 
The 13 rows represent the 13 age groups -- rows 1-12 are the twelve age groups listed in section d.ii) above, whereas the 13th row is for all adults over 25 years old (we do not use the coefficients for all adults in our model)
The 7 columns are:
1: "Cause" : Cause of death (NCD+LRI)
2: "age" : age group -- as described above in section d.ii) above, plus 25.0 for all adults
3-7 are the model parameters given by Burnett et al. (2018):
3: "th_w" : theta
4: "SE_th_w" : standard error for theta (we only use the mean value in our model)
5: "al_w": alpha
6: "mu_w" : mu
7: "v_w" : nu
Source: These coefficients come directly from Burnett et al. (2018)

f) Population-weighted mapping of InMAP cells to counties
Object in Model_Inputs.RData file: InMAP.Cells.by.County
This is a large matrix (3,108 x 52,411), where the 3,108 rows are the 3,108 counties and the 52,411 columns are the 52,411 InMAP cells (Goodkind et al., 2019a). Element Pij of this matrix (ith row and jth column) contains the percentage of population of county "i" in InMAP cell "j", i.e. 
Pij = Pj_in_i/Pi 

where Pj_in_i is the InMAP cell population that is located within the boundaries of county i; and Pi is the total population of county i.

Source: We create this population mapping combining population data from the ISRM (Goodkind et al., 2019a, 2019b) and from the U.S. Census Bureau (2019a, 2019b). The U.S. Census Bureau (2019a) are 2015-2019 5-year population estimates from the American Community Survey (ACS), on a block group level. We combine it with the block group shapefiles from U.S. Census Bureau (2019b). A description of the method used is available in the Supplementary Information file of the paper. 

We also provide a separate R code which can be used to generate this input.
The R code file is: Marginal_Damages_Model/InMAP_to_County_Population_Mapping/Code/Population_Mapping_Script.R
This code requires additional data downloads to be executed. These are described in a separate documentation file: Marginal_Damages_Model/InMAP_to_County_Population_Mapping/Population_Mapping_Documentation.txt
We also note that this Population_Mapping_Code.R code took about 3 days of computing time on the Harvard University FASRC Odissey Cluster (using a single core). 


References: 

Burnett, R., Chen, H., Szyszkowicz, M., Fann, N., Hubbell, B., Pope, C.A., …, Spadaro, J.V., 2018. Global estimates of mortality associated with long-term exposure to outdoor fine particulate matter. Proc. of the Natl. Acad. of Sci. of the U. S. A. 115, 9592-9597. https://doi.org/10.1073/pnas.1803222115.

[dataset] Centers for Disease Control and Prevention, 2020. Underlying Cause of Death 1999-2018 on CDC WONDER Online Database, released in 2020. Centers for Disease Control and Prevention, National Center for Health Statistics. Data are from the Multiple Cause of Death Files, 1999-2018, as compiled from data provided by the 57 vital statistics jurisdictions through the Vital Statistics Cooperative Program. http://wonder.cdc.gov/ucd-icd10.html (accessed 30 November 2020).

Di, Q., Amini, H., Shi, L., Kloog, I., Kelly, J., Sabath, M.B., …, Schwartz, J., 2019. An ensemble-based model of PM2.5 concentration across the contiguous United States with high spatiotemporal resolution. Environment International 130, 104909. https://doi.org/10.1016/j.envint.2019.104909.

Goodkind, A.L., Tessum, C.W., Coggings, J.S., Hill, J.D., Marshall, J.D., 2019a. Fine-scale damage estimates of particulate matter air pollution reveal opportunities for location-specific mitigation of emissions. Proc. of the Natl. Acad. of Sci. of the U. S. A 116, 8775-8780. https://doi.org/10.1073/pnas.1816102116.

[dataset] Goodkind, A.L., Tessum, C.W., Coggings, J.S., Hill J.D., Marshall, J.D., 2019b. Data from “InMAP Source-Receptor Matrix (ISRM) dataset”. Zenodo. https://dx.doi.org/10.5281/zenodo.3590127 (accessed 03 March 2021). Deposited on 11 March 2019.

Krewski, D., Jerrett, M., Burnett, R.T., Ma, R., Hughes, E., Shi, Y., …, Thun, M.J., 2009. Extended Follow-Up and Spatial Analysis of the American Cancer Society Study Linking Particulate Air Pollution and Mortality. Health Effects Institute, Boston MA: HEI Research Report 140. https://www.healtheffects.org/system/files/Krewski140.pdf (accessed 22 April 2020).

[dataset] U.S. Census Bureau, 2011. Census TIGER/Line Shapefiles. https://www2.census.gov/geo/tiger/TIGER2010BLKPOPHU/ (accessed 11 September 2020).

[dataset] U.S. Census Bureau, 2019a. 2015-2019 American Community Survey 5-Year Estimates: variable B1003 (TOTAL POPULATION). https://data.census.gov/cedsci/advanced (accessed 21 February 2021).

[dataset] U.S. Census Bureau, 2019b. Census TIGER/Line Shapefiles. https://www2.census.gov/geo/tiger/TIGER2019/BG/ (accessed 05 March 2021).

[dataset] U.S. Department of Health and Human Services, 2020. United States Department of Health and Human Services (US DHHS), Centers for Disease Control and Prevention (CDC), National Center for Health Statistics (NCHS), Bridged-Race Population Estimates, United States July 1st resident population by state, county, age, sex, bridged-race, and Hispanic origin. Compiled from 1990-1999 bridged-race intercensal population estimates (released by NCHS on 7/26/2004); revised bridged-race 2000-2009 intercensal population estimates (released by NCHS on 10/26/2012); and bridged-race Vintage 2019 (2010-2019) postcensal population estimates (released by NCHS on 7/9/2020). Available on CDC WONDER Online Database. http://wonder.cdc.gov/bridged-race-v2019.html (accessed 30 November 2020).

Vodonos, A, Abu Awad, Y., Schwartz, J., 2018. The concentration-response between long-term PM2.5 exposure and mortality; A meta-regression approach. Environ. Res. 166, 677-689. https://doi.org/10.1016/j.envres.2018.06.021.
